Non-Disjoint Discretization for Naive-Bayes Classifiers
Authors
Abstract
Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain more reliable probability estimation. It also adjusts the number and size of discretized intervals to the number of training instances, seeking an appropriate trade-off between bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that for naive-Bayes classifiers, NDD works better than alternative discretization approaches.
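To make the construction concrete, here is a minimal sketch of how NDD-style overlapping intervals could be built. It assumes equal-frequency atomic intervals whose size is tied to sqrt(n), with each discretized interval spanning three consecutive atomic intervals so that every value falls in the middle one; the function names and the exact sizing rule are illustrative, not the paper's reference implementation.

```python
import math
from bisect import bisect_right

def ndd_cut_points(train_values):
    """Build atomic-interval boundaries for NDD-style discretization.

    Sketch assumptions: equal-frequency atomic intervals sized so that a
    window of three holds roughly sqrt(n) training instances, letting
    both interval size and interval count grow with n.
    """
    vals = sorted(train_values)
    n = len(vals)
    # One third of a sqrt(n)-sized interval per atomic interval.
    atomic_size = max(1, round(math.sqrt(n) / 3))
    return [vals[i] for i in range(atomic_size, n, atomic_size)]

def ndd_interval(x, cuts):
    """Return the atomic-index span (lo, hi) of the overlapping interval
    assigned to x: the atomic interval containing x plus one neighbour on
    each side, so x sits toward the middle of its own interval."""
    k = bisect_right(cuts, x)              # atomic interval holding x
    return (max(0, k - 1), min(len(cuts), k + 1))
```

Naive-Bayes probability estimates would then be computed from the training instances inside that three-atomic-interval window, so each value's estimate is centred on the value itself, while the intervals assigned to nearby values overlap rather than sharing a hard boundary.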
Similar Resources
Non-Disjoint Discretization for Aggregating One-Dependence Estimator Classifiers
There is still a lack of clarity about the best manner in which to handle numeric attributes when applying Bayesian network classifiers. Discretization methods entail an unavoidable loss of information. Nonetheless, a number of studies have shown that appropriate discretization can outperform straightforward use of common but often unrealistic parametric distributions (e.g., Gaussian). Previous st...
Proportional k-Interval Discretization for Naive-Bayes Classifiers
This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thus...
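As a rough illustration of the proportional rule (under the assumption, consistent with this abstract, that both the number of equal-frequency intervals and the instances per interval are set to about sqrt(n)):

```python
import math

def pkid_cut_points(train_values):
    """Sketch of PKID-style sizing: with n training instances, use
    roughly sqrt(n) equal-frequency intervals of roughly sqrt(n)
    instances each, so both grow as training data accumulates."""
    vals = sorted(train_values)
    n = len(vals)
    size = max(1, round(math.sqrt(n)))     # instances per interval
    return [vals[i] for i in range(size, n, size)]
```

With n = 900 this yields about 30 intervals of 30 instances each; additional data buys both lower bias (more, narrower intervals) and lower variance (more instances per interval).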
On Why Discretization Works for Naive-Bayes Classifiers
We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under ...
Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers
The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analys...
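The truncated abstract does not give the weighting rule, so the following is only a hypothetical sketch of the idea it describes: guarantee a minimum number of instances per interval to cap variance, then let the interval count grow with n to reduce bias. The constant 30 and the sizing equation below are assumptions for illustration, not the paper's formula.

```python
import math

def wpkid_layout(n, min_size=30):
    """Hypothetical WPKID-style sizing: keep at least min_size training
    instances per interval (controlling variance), and let the interval
    count t grow with n once that floor is met, here by solving
    t * (min_size + t) = n for t (an illustrative rule only)."""
    t = max(1, int((-min_size + math.sqrt(min_size ** 2 + 4 * n)) / 2))
    size = max(min_size, n // t)           # instances per interval
    return t, size
```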
Discretizing Continuous Features for Naive Bayes and C4.5 Classifiers
In this work, popular discretization techniques for continuous features in data sets are surveyed, and a new one based on equal-width binning and error minimization is introduced. This discretization technique is implemented on the Adult database from the UCI Machine Learning Repository [7] and tested on two classifiers from the WEKA tool [6], NaiveBayes and J48. Relative performance changes for t...
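For reference, here is the equal-width baseline that the surveyed method builds on; the error-minimising selection of the bin count is the paper's contribution, and the search wrapper below (including its error_fn parameter) is only an assumed illustration of that step.

```python
def equal_width_assign(values, k):
    """Plain equal-width binning: split [min, max] into k bins of equal
    width and map each value to its bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0           # guard against a zero range
    return [min(int((v - lo) / width), k - 1) for v in values]

def pick_k(values, labels, error_fn, candidates=range(2, 21)):
    """Hypothetical wrapper: try several bin counts and keep the one
    whose discretization minimises a supplied training-error estimate."""
    return min(candidates,
               key=lambda k: error_fn(equal_width_assign(values, k), labels))
```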
Publication date: 2002